
Conversation

@Hermit-w Hermit-w commented Nov 6, 2025

xLLM launch parameters:

export PYTHON_INCLUDE_PATH="$(python3 -c 'from sysconfig import get_paths; print(get_paths()["include"])')"
export PYTHON_LIB_PATH="$(python3 -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR"))')"  # the original re-used the "include" path here, almost certainly a copy-paste slip; LIBDIR is the likely intent
export PYTORCH_NPU_INSTALL_PATH=/usr/local/libtorch_npu/
export PYTORCH_INSTALL_PATH="$(python3 -c 'import torch, os; print(os.path.dirname(os.path.abspath(torch.__file__)))')"
export LIBTORCH_ROOT="$(python3 -c 'import torch, os; print(os.path.dirname(os.path.abspath(torch.__file__)))')"
export LD_LIBRARY_PATH=/usr/local/libtorch_npu/lib:$LD_LIBRARY_PATH

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
# export ASCEND_RT_VISIBLE_DEVICES=0
#export ASCEND_RT_VISIBLE_DEVICES=4,5
export ASDOPS_LOG_TO_STDOUT=1
export ASDOPS_LOG_LEVEL=ERROR
export ATB_LOG_TO_STDOUT=1
# export ASDOPS_LOG_TO_FILE=1 
# export HCCL_BUFFSIZE=1024
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export NPU_MEMORY_FRACTION=0.98
export ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=3
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1

export OMP_NUM_THREADS=12

export HCCL_CONNECT_TIMEOUT=7200

\rm -rf /root/atb/log/
\rm -rf /root/ascend/log/
\rm -rf core.*

MODEL_PATH="/export/home/lanliwei.1/models/models/DeepSeek-V3"
MASTER_NODE_ADDR="11.87.49.111:9590"
START_PORT=18999
START_DEVICE=0
LOG_DIR="log"
NNODES=16
WORLD_SIZE=16

export HCCL_IF_BASE_PORT=43439


for (( i=0; i<$NNODES; i++ ))
do
  PORT=$((START_PORT + i))
  DEVICE=$((START_DEVICE + i))
  LOG_FILE="$LOG_DIR/node_$i.log"
    /export/home/lanliwei.1/code/mla_xllm_customize/xllm/build/xllm/core/server/xllm \
    --model $MODEL_PATH \
    --port $PORT \
    --devices="npu:$DEVICE" \
    --master_node_addr=$MASTER_NODE_ADDR \
    --nnodes=$WORLD_SIZE \
    --node_rank=$i \
    --max_memory_utilization=0.8 \
    --max_tokens_per_batch=20000 \
    --max_seqs_per_batch=2000 \
    --block_size=128 \
    --enable_prefix_cache=false \
    --enable_chunked_prefill=false \
    --communication_backend="hccl" \
    --enable_schedule_overlap=true \
    --enable_mla=true \
    --ep_size=16 \
    --dp_size=4 \
    --enable_customize_mla_kernel \
    > $LOG_FILE 2>&1 &
done
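
Not part of the PR, but once the loop returns, a quick sanity check such as the following sketch can confirm every rank survived startup; the pgrep pattern and the 30-second grace period are assumptions.

# Hypothetical post-launch check, not in the original script.
sleep 30  # arbitrary grace period for the 16 workers to initialize
for (( i=0; i<NNODES; i++ )); do
  # Each launch command carries a unique --node_rank=$i; the trailing space
  # in the pattern keeps rank 1 from also matching ranks 10-15.
  if ! pgrep -f "node_rank=$i " > /dev/null; then
    echo "node $i exited early, see $LOG_DIR/node_$i.log"
  fi
  grep -iE "error|core dumped" "$LOG_DIR/node_$i.log" | head -n 3
done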

Tested with the following script: benchmark.py

Test parameters:

python3 benchmark.py \
 --backend xllm \
 --model /export/home/lanliwei.1/models/models/DeepSeek-V3 \
 --dataset-name random \
 --random-range-ratio 1 \
 --num-prompt 420 \
 --request-rate 2 \
 --max-concurrency 100 \
 --random-input 2048 \
 --random-output 2048 \
 --host 127.0.0.1 \
 --port 18999 \
 --dataset-path /export/home/lanliwei.1/dataset/ShareGPT_V3_unfiltered_cleaned_split.json

Performance comparison:

[performance comparison screenshot attached in the original PR]

From the diff under review:

  [](const std::vector<RawForwardInput>& inputs) {
    return std::all_of(
        inputs.begin(), inputs.end(), [](const RawForwardInput& input) {
          return input.flatten_tokens_vec.size() < 230;
        });
  }

A reviewer (Collaborator) left an inline comment on this snippet:
This magic number needs to be defined separately as constexpr, and the name should indicate what it is.

@Hermit-w (author) replied:

A named constant has been added, with explanatory comments.
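
The merged constant is not quoted in this thread; a minimal sketch of the requested change, with an assumed name and a trimmed stand-in for the real RawForwardInput, might look like:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RawForwardInput {  // stand-in: the real xLLM struct has more fields
  std::vector<int32_t> flatten_tokens_vec;
};

// Assumed name; the identifier in the merged commit is not shown in this
// thread. 230 is the flattened-token ceiling below which the customized
// MLA kernel path is taken.
constexpr size_t kCustomizeMlaKernelMaxTokens = 230;

bool fits_customize_mla_kernel(const std::vector<RawForwardInput>& inputs) {
  return std::all_of(
      inputs.begin(), inputs.end(), [](const RawForwardInput& input) {
        return input.flatten_tokens_vec.size() < kCustomizeMlaKernelMaxTokens;
      });
}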

@LMX-xin LMX-xin commented Nov 7, 2025

LGTM
